Exploiting Cross-Linguistic Similarities in Zulu and Xhosa Computational Morphology
Authors
Abstract
This paper investigates the possibilities that cross-linguistic similarities and dissimilarities between related languages offer for bootstrapping a morphological analyser. In this case an existing Zulu morphological analyser prototype (ZulMorph) serves as the basis for a Xhosa analyser. The investigation is structured around the morphotactics and the morphophonological alternations of the languages involved. Special attention is given to the so-called “open” class, which represents the word root lexicons, specifically for nouns and verbs. The acquisition and coverage of these lexicons prove to be crucial for the success of the analysers under development. The bootstrapped morphological analyser is applied to parallel test corpora and the results are discussed. A variety of cross-linguistic effects is illustrated with examples from the corpora. It is found that bootstrapping morphological analysers across languages that exhibit significant structural and lexical similarities can be fruitfully exploited to develop analysers for lesser-resourced languages.
Similar resources
Semi-automated extraction of morphological grammars for Nguni with special reference to Southern Ndebele
A finite-state morphological grammar for Southern Ndebele, a seriously under-resourced language, has been semi-automatically obtained from a general Nguni morphological analyser, which was bootstrapped from a mature hand-written morphological analyser for Zulu. The results for Southern Ndebele morphological analysis, using the Nguni analyser, are surprisingly good, showing that the Nguni langua...
Validation of the 10-item Centre for Epidemiological Studies Depression Scale (CES-D-10) in Zulu, Xhosa and Afrikaans populations in South Africa
BACKGROUND The 10-item Centre for Epidemiological Studies Depression Scale (CES-D-10) is a depression screening tool that has been used in the South African National Income Dynamics Study (NIDS), a national household panel study. This screening tool has not yet been validated in South Africa. This study aimed to establish the reliability and validity of the CES-D-10 in Zulu, Xhosa and Afrikaans...
Experimental Bootstrapping of Morphological Analysers for Nguni Languages
This paper addresses the experimental bootstrapping of the development of broad-coverage finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by using an existing prototype of a morphological analyser for Zulu. These languages are both morphologically complex and resource-scarce. The research question is whether bootstrapping is feasible across the language boundaries be...
Language-dependent State Clustering for Multilingual Speech Recognition in Afrikaans, South African English, Xhosa and Zulu
The development of automatic speech recognition systems requires significant quantities of annotated acoustic data. In South Africa, the large number of spoken languages hampers such data collection efforts. Furthermore, code switching and mixing are commonplace since most citizens speak two or more languages fluently. As a result a considerable degree of phonetic cross pollination between lang...
Software Tools for Morphological Tagging of Zulu Corpora and Lexicon Development
The aim of this paper is to discuss aspects of an on-going project on the development of grammatical and lexical resources for Zulu with sufficient coverage for unrestricted text. We explain how the basic software tools of computational morphology are used in linguistic processing, more specifically for automatic word form recognition and morphological tagging of the growing stock of electronic...